Fault-Tolerance Implementation in Typical Distributed Stream Processing Systems

Authors

  • Wu-Hong Chen
  • Jichiang Tsai
Abstract

Typical training simulation systems adopt distributed network architectures built from personal computers for reasons of cost, extensibility, and maintenance. In such a design, a failure or error in any one computer during operation can easily affect the functions of the entire system. An appropriate fault-tolerance mechanism, one that keeps the entire system operating normally when a subsystem computer behaves irregularly, is therefore an important consideration in the design of a typical training simulation system. Because firearms training simulation systems transmit and process substantial amounts of streaming data, they can be regarded as typical distributed stream processing systems. In this paper, we examine fault-tolerance mechanism designs and techniques for typical distributed stream processing systems and apply them to a typical firearms training simulation system to increase its operational reliability and availability. We use the transparent checkpoint method to implement the fault-tolerance processing program. The results of single-machine fault-tolerance tests and multi-machine synchronized fault-tolerance tests indicate that checkpoint establishment and rollback recovery times can satisfy the system's operating requirements.
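The checkpoint and rollback recovery idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses a transparent (system-level) checkpoint method, whereas the sketch below is a simpler application-level variant, and every name in it (`CountingOperator`, `run`, `checkpoint_every`, `fail_at`) is hypothetical. It assumes the input stream can be replayed from a saved offset.

```python
import copy


class CountingOperator:
    """Toy stream operator (hypothetical): running count and sum of inputs."""

    def __init__(self):
        self.state = {"count": 0, "total": 0}
        self.offset = 0           # number of stream items already processed
        self._checkpoint = None   # last saved (state, offset) pair

    def process(self, value):
        self.state["count"] += 1
        self.state["total"] += value
        self.offset += 1

    def checkpoint(self):
        # Save a deep copy so later updates cannot alter the snapshot.
        self._checkpoint = (copy.deepcopy(self.state), self.offset)

    def rollback(self):
        # Restore the last snapshot, or the initial state if none exists.
        if self._checkpoint is None:
            self.state, self.offset = {"count": 0, "total": 0}, 0
        else:
            state, offset = self._checkpoint
            self.state, self.offset = copy.deepcopy(state), offset


def run(stream, checkpoint_every=3, fail_at=None):
    """Process a replayable stream, simulating one failure at offset fail_at."""
    op = CountingOperator()
    failed = False
    while op.offset < len(stream):
        if fail_at is not None and op.offset == fail_at and not failed:
            failed = True
            op.state = {"count": -1, "total": -1}  # simulated state corruption
            op.rollback()                          # recover, then replay
            continue
        op.process(stream[op.offset])
        if op.offset % checkpoint_every == 0:
            op.checkpoint()
    return op.state
```

With these assumptions, a run that fails once and rolls back ends in the same state as a failure-free run, because the items processed after the last checkpoint are replayed from the saved offset.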

Similar resources

Toward High-Performance Distributed Stream Processing via Approximate Fault Tolerance

Fault tolerance is critical for distributed stream processing systems, yet achieving error-free fault tolerance often incurs substantial performance overhead. We present AF-Stream, a distributed stream processing system that addresses the trade-off between performance and accuracy in fault tolerance. AF-Stream builds on a notion called approximate fault tolerance, whose idea is to mitigate back...

A Quality-Centric Data Model for Distributed Stream Management Systems

It is challenging for large-scale stream management systems to return always perfect results when processing data streams originating from distributed sources. Data sources and intermediate processing nodes may fail during the lifetime of a stream query. In addition, individual nodes may become overloaded due to processing demands. In practice, users have to accept incomplete or inaccurate quer...

When Stream Processing crosses MapReduce

Although Event Stream Processing (ESP) systems have existed for more than a decade, we have recently witnessed a true renaissance of ESP systems that have adopted the popular MapReduce paradigm. In this white paper, we advocate for the StreamMapReduce approach as it (i) allows a quick and easy transition of legacy MapReduce-based applications to ESP, (ii) simplifies the implementation of fault toleran...

Fault-tolerant stream processing using a distributed, replicated file system

We present SGuard, a new fault-tolerance technique for distributed stream processing engines (SPEs) running in clusters of commodity servers. SGuard is less disruptive to normal stream processing and leaves more resources available for normal stream processing than previous proposals. Like several previous schemes, SGuard is based on rollback recovery [18]: it checkpoints the state of stream pr...

Fault tolerance for stream programs on parallel platforms

A distributed system is defined as a collection of autonomous computers connected by a network, and with the appropriate distributed software for the system to be seen by users as a single entity capable of providing computing facilities. Distributed systems with centralised control have a distinguished control node, called leader node. The main role of a leader node is to distribute and manage...

Journal:
  • J. Inf. Sci. Eng.

Volume 30  Issue

Pages  -

Publication date: 2014